2024-03-12 06:48:50
SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations
Amit Meghanani, Thomas Hain
https://arxiv.org/abs/2403.06260 https:/…
Structure-aware Fine-tuning for Code Pre-trained Models
Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao
https://arxiv.org/abs/2404.07471 http…
Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond
Wenpin Tang
https://arxiv.org/abs/2403.06279 https://
This https://arxiv.org/abs/2312.14378 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
William Whitworth, who wrote revealing profiles in The New Yorker before working as the EIC at The Atlantic from 1980 to 1999, died on March 8 at age 87 (Sam Roberts/New York Times)
https://www.nytimes.com/2024/03/09/…
Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: An Empirical Study
Shuo Liu, Jacky Keung, Zhen Yang, Fang Liu, Qilin Zhou, Yihan Liao
https://arxiv.org/abs/2402.06247
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang
https://arxiv.org/abs/2403.06504
This https://arxiv.org/abs/2401.05126 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Improving Low-Resource Knowledge Tracing Tasks by Supervised Pre-training and Importance Mechanism Fine-tuning
Hengyuan Zhang, Zitao Liu, Shuyan Huang, Chenming Shang, Bojun Zhan, Yong Jiang
https://arxiv.org/abs/2403.06725
Fine-Tuning Surrogate Gradient Learning for Optimal Hardware Performance in Spiking Neural Networks
Ilkin Aliyev, Tosiron Adegbija
https://arxiv.org/abs/2402.06211
#ExplainVintageTechnology
#HashTagGames
It's a TV... but with no colour, the screen was curved, and you had to fiddle endlessly with the rabbit's ears until you finally got a good picture. Then you had to twist a dial called "fine tuning" to try and get…
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra
https://arxiv.org/…
Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models
Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang
https://arxiv.org/abs/2404.06448
Increased LLM Vulnerabilities from Fine-tuning and Quantization
Divyanshu Kumar, Anurakt Kumar, Sahil Agarwal, Prashanth Harshangi
https://arxiv.org/abs/2404.04392
Truthful Aggregation of LLMs with an Application to Online Advertising
Ermis Soumalias, Michael J. Curry, Sven Seuken
https://arxiv.org/abs/2405.05905 http…
Randomness Is All You Need: Semantic Traversal of Problem-Solution Spaces with Large Language Models
Thomas Sandholm, Sayandev Mukherjee, Bernardo A. Huberman
https://arxiv.org/abs/2402.06053
This https://arxiv.org/abs/2312.15698 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2309.12307 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Si Superstrate Lenses on Patch-Antenna-Coupled TeraFETs: NEP Optimization and Frequency Fine-Tuning
Anastasiya Krysl, Dmytro B. But, Kęstutis Ikamas, Jakob Holstein, Anna Shevchik-Shekera, Hartmut G. Roskos, Alvydas Lisauskas
https://arxiv.org/abs/2404.07715
This https://arxiv.org/abs/2404.05426 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification
Manish Chandra, Debasis Ganguly, Yiwen Li, Iadh Ounis
https://arxiv.org/abs/2403.06402
Jack Clark’s Import AI newsletter is always fun; this one especially because it explores the shared interest Meta and the CCP have in preventing fine-tuning of their models.
In a “Palantír”-level irony, one of the projects to prevent all further learning is called SOPHON. https://
Data-driven sparse modeling of oscillations in plasma space propulsion
B. Bayón-Buján, M. Merino
https://arxiv.org/abs/2403.06809 https://
SAM-I-Am: Semantic Boosting for Zero-shot Atomic-Scale Electron Micrograph Segmentation
Waqwoya Abebe, Jan Strube, Luanzheng Guo, Nathan R. Tallent, Oceane Bel, Steven Spurgeon, Christina Doty, Ali Jannesari
https://arxiv.org/abs/2404.06638
Geometry from geodesics: fine-tuning Ehlers, Pirani, and Schild
James T. Wheeler
https://arxiv.org/abs/2404.03815 https://arxiv.org/p…
OpenAI expands its Custom Model training program with "assisted fine-tuning", letting organizations set up data training pipelines, evaluation systems, and more (Kyle Wiggers/TechCrunch)
https://techcrunch.com/2024/04/04/openai-expands…
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya…
Speaking of vaccines, this could be very important. https://mag.uchicago.edu/science-medicine/fine-tuning-immunity
This https://arxiv.org/abs/2402.04004 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2310.06611 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_hepp…
This https://arxiv.org/abs/2312.03045 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
Low-Dose CT Image Reconstruction by Fine-Tuning a UNet Pretrained for Gaussian Denoising for the Downstream Task of Image Enhancement
Tim Selig, Thomas März, Martin Storath, Andreas Weinmann
https://arxiv.org/abs/2403.03551
AD-NEv: The multi-architecture neuroevolution-based multivariate anomaly detection framework
Marcin Pietroń, Dominik Żurek, Kamil Faber, Roberto Corizzo
https://arxiv.org/abs/2404.07968
This https://arxiv.org/abs/2307.07218 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
This https://arxiv.org/abs/2402.10100 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSD_…
Iris-SAM: Iris Segmentation Using a Foundational Model
Parisa Farmanifard, Arun Ross
https://arxiv.org/abs/2402.06497 https://arxiv.o…
This https://arxiv.org/abs/2401.04190 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_…
This https://arxiv.org/abs/2305.00418 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
This https://arxiv.org/abs/2403.16915 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
PLLM-CS: Pre-trained Large Language Model (LLM) for Cyber Threat Detection in Satellite Networks
Mohammed Hassanin, Marwa Keshk, Sara Salim, Majid Alsubaie, Dharmendra Sharma
https://arxiv.org/abs/2405.05469
Wiley licenses content for training an #LLM. The company was not named, but I would suspect it's the one which has been signing a lot of licensing deals lately. Access to STM content could be a big differentiator, though I wouldn't expect it to be exclusive. Also, $23M sounds small.
This https://arxiv.org/abs/2310.05910 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2405.02422 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_ees…
Higgs Alignment from Multicritical-Point Principle in Two Higgs Doublet Models
Hikaru Kawai, Kiyoharu Kawana, Kin-ya Oda, Kei Yagyu
https://arxiv.org/abs/2404.06096
This https://arxiv.org/abs/2312.09979 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
This https://arxiv.org/abs/2306.07285 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Deep Prompt Multi-task Network for Abuse Language Detection
Jian Zhu, Yuping Ruan, Jingfei Chang, Cheng Luo
https://arxiv.org/abs/2403.05268 https://
This https://arxiv.org/abs/2404.15786 has been replaced.
link: https://scholar.google.com/scholar?q=a
Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion
Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu Zhang, Huichao Zhang, Xiaofeng Yang, Jiashi Feng, Guosheng Lin
https://arxiv.org/abs/2404.06429
Coarse-Tuning for Ad-hoc Document Retrieval Using Pre-trained Language Models
Atsushi Keyaki, Ribeka Keyaki
https://arxiv.org/abs/2403.16915 https://
Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity
Sho Hoshino, Akihiko Kato, Soichiro Murakami, Peinan Zhang
https://arxiv.org/abs/2403.05257
Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment
Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Muhao Chen, Junjie Hu, Yixuan Li, Bo Li, Chaowei Xiao
https://arxiv.org/abs/2402.14968
Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?
Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna
https://arxiv.org/abs/2404.06510
This https://arxiv.org/abs/2304.04067 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csNE_…
This https://arxiv.org/abs/2305.18582 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion
Qi Guo, Xiaohong Li, Xiaofei Xie, Shangqing Liu, Ze Tang, Ruitao Feng, Junjie Wang, Jidong Ge, Lei Bu
https://arxiv.org/abs/2404.01554
This https://arxiv.org/abs/2404.11536 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2402.14968 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin
https://arxiv.org/abs/2403.05518
This https://arxiv.org/abs/2305.13179 has been replaced.
link: https://scholar.google.com/scholar?q=a
Learning-to-learn enables rapid learning with phase-change memory-based in-memory computing
Thomas Ortner, Horst Petschenig, Athanasios Vasilopoulos, Roland Renner, Špela Brglez, Thomas Limbacher, Enrique Piñero, Alejandro Linares Barranco, Angeliki Pantazi, Robert Legenstein
https://arxiv.org/abs/2405.05141
Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R
Amirreza Esmaeili, Iman Saberi, Fatemeh H. Fard
https://arxiv.org/abs/2405.01553
Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation
Tong Su, Xin Peng, Sarubi Thillainathan, David Guzmán, Surangika Ranathunga, En-Shiun Annie Lee
https://arxiv.org/abs/2404.04212
A Fine-tuning Enhanced RAG System with Quantized Influence Measure as AI Judge
Keshav Rangan, Yiqiao Yin
https://arxiv.org/abs/2402.17081 https://
This https://arxiv.org/abs/2402.12168 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2404.14367 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages
Sankalp Bahad, Pruthwik Mishra, Karunesh Arora, Rakesh Chandra Balabantaray, Dipti Misra Sharma, Parameswari Krishnamurthy
https://arxiv.org/abs/2405.04829
Constraining Large Language Model for Generating Computer-Parsable Content
Jiaye Wang
https://arxiv.org/abs/2404.05499 https://arxiv.…
This https://arxiv.org/abs/2310.09266 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2402.17412 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCV_…
This https://arxiv.org/abs/2403.01432 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2402.00905 has been replaced.
link: https://scholar.google.com/scholar?q=a
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi
https://arxiv.org/abs/2405.00732
Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs
Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawendé F. Bissyandé, Claire Le Goues, Shunfu Jin
https://arxiv.org/abs/2404.12636
This https://arxiv.org/abs/2403.20145 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau
https://arxiv.org/abs/2402.14811
This https://arxiv.org/abs/2403.09891 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models
Jenish Maharjan, Anurag Garikipati, Navan Preet Singh, Leo Cyrus, Mayank Sharma, Madalina Ciobanu, Gina Barnes, Rahul Thapa, Qingqing Mao, Ritankar Das
https://arxiv.org/abs/2402.19371
Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish
Recep Firat Cekinel, Pinar Karagoz, Cagri Coltekin
https://arxiv.org/abs/2403.00411
Learning or Self-aligning? Rethinking Instruction Fine-tuning
Mengjie Ren, Boxi Cao, Hongyu Lin, Liu Cao, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun
https://arxiv.org/abs/2402.18243
This https://arxiv.org/abs/2404.00213 has been replaced.
link: https://scholar.google.com/scholar?q=a
Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models
Bowen Zhang, Kehua Chang, Chunping Li
https://arxiv.org/abs/2404.03921 ht…
This https://arxiv.org/abs/2309.13734 has been replaced.
link: https://scholar.google.com/scholar?q=a
Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
Shuo Yang, Gjergji Kasneci
https://arxiv.org/abs/2402.18284
This https://arxiv.org/abs/2403.18025 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCL_…
INSTRAUG: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning
Wei Han, Hui Chen, Soujanya Poria
https://arxiv.org/abs/2402.14492 http…